35 research outputs found

    Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs

    Convolutional neural networks (CNNs) have recently attracted considerable attention due to their outstanding accuracy in applications such as image recognition and natural language processing. While one advantage of CNNs over other types of neural networks is their reduced computational cost, faster execution is still desired for both training and inference. Since convolution operations account for most of the execution time, multiple algorithms have been and are being developed with the aim of accelerating this type of operation. However, due to the wide range of convolution parameter configurations used in CNNs and the possible data type representations, it is not straightforward to assess in advance which of the available algorithms will perform best in each particular case. In this paper, we present a performance evaluation of the convolution algorithms provided by cuDNN, the library used by most deep learning frameworks for their GPU operations. In our analysis, we leverage the convolution parameter configurations from widely used CNNs and discuss which algorithms are better suited depending on the convolution parameters for both 32-bit and 16-bit floating-point (FP) data representations. Our results show that the filter size and the number of inputs are the most significant parameters when selecting a GPU convolution algorithm for 32-bit FP data. For 16-bit FP, leveraging specialized arithmetic units (NVIDIA Tensor Cores) is key to obtaining the best performance. This work was supported by the European Union's Horizon 2020 Research and Innovation Program under Marie Sklodowska-Curie Grant 749516, and in part by the Spanish Juan de la Cierva program under Grant IJCI-2017-33511. Peer Reviewed. Postprint (published version).

    Interface for the teleoperated control of surgical tools (Interfície pel comandament teleoperat d'útils quirúrgics)

    The aim of this final-year degree project is to design an interface capable of sensing the movements of a surgical instrument during a laparoscopic intervention. This system will make it possible to command the end effector of a teleoperated robot.

    Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

    In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread therefore cannot directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency. We thank Diego Marrón for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU's Horizon2020 research and innovation programme), the Israel Science Foundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space. Peer Reviewed. Postprint (author's final draft).

    High throughput determination log Po/w/pKa/log Do/w of drugs by combination of UHPLC and CE methods

    In 1997, Valkó et al. developed a generic fast gradient HPLC method, based on the calculation of the Chromatographic Hydrophobicity Index (CHI) from the gradient retention times, in order to measure lipophilicity. We have employed the correlations between CHI and log Po/w and adapted the rapid gradient HPLC method to UHPLC, obtaining excellent resolution and repeatability in a short analysis time (<4 min). log Po/w values can be easily obtained from these CHI measurements but, unfortunately, these correlations are only valid for non-ionized compounds. Consequently, in order to determine the effective log Po/w value at a particular pH, a fast high-throughput method for pKa determination was required. The IS-CE method, based on the use of internal standards (IS) and capillary electrophoresis (CE), is a fast and attractive alternative to other methods for pKa determination, since it offers multiple advantages compared to them: low amounts of test compounds and reagents are needed, high purity is not required, specific interactions between test compounds and buffers are corrected, etc. In addition, it allows the determination of a pKa value in less than 5 minutes. Both CHI and IS-CE have been combined in order to describe a high-throughput alternative for the determination of the lipophilicity profiles of bioactive compounds.
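
The step from log Po/w to an effective value at a given pH can be illustrated with the standard textbook relation for a monoprotic compound, under the assumption that only the neutral species partitions into octanol. The sketch below is not taken from the paper: the function name `log_d` and its parameters are hypothetical, and compounds with several ionizable groups would need a more general expression.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str = "acid") -> float:
    """Estimate log D at a given pH for a monoprotic compound.

    Assumption (not from the source): only the neutral form partitions
    into octanol, so log D = log P - log10(1 + 10**x), where x is
    (pH - pKa) for an acid and (pKa - pH) for a base.
    """
    if kind == "acid":
        exponent = ph - pka  # acids ionize as pH rises above pKa
    elif kind == "base":
        exponent = pka - ph  # bases ionize as pH drops below pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)
```

When the pH equals the pKa, half of the compound is ionized and the estimate drops by log10(2), about 0.30 log units, below log P. Far on the non-ionized side of the pKa, the estimate approaches log P itself.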

    Novel instrument for automated pKa determination by internal standard capillary electrophoresis

    The internal standard capillary electrophoresis method (IS-CE) has been implemented in a novel sequential injection−capillary electrophoresis instrument for the high-throughput determination of acidity constants (pKa) regardless of aqueous solubility, number of pKa values, or structure. This instrument comprises a buffer creation system that automatically mixes four reagents within a few seconds for in situ creation of the separation electrolyte, with a pH range of 2−13, an ionic strength of 10−100 mM and an organic solvent content from 0% to 40%. Combined with a field strength of 1.2 kV/cm and a short effective length (15 cm to the UV detector), fast 20 s electrophoretic separations can be obtained. The low standard deviation of the replicates and the low variation compared to reference values show that this system can accurately determine acidity constants of drugs by IS-CE. A single pKa can be determined in 2 min and a set of 20 measurements in half an hour, allowing rapid, simple, and flexible determination of pKa values of pharmaceutical targets.

    Quantitative Metabolomics and Instationary 13C-Metabolic Flux Analysis Reveals Impact of Recombinant Protein Production on Trehalose and Energy Metabolism in Pichia pastoris

    Pichia pastoris has been recognized as an effective host for recombinant protein production. In this work, we combine metabolomics and instationary 13C metabolic flux analysis (INST 13C-MFA) using GC-MS and LC-MS/MS to evaluate the potential impact of the production of a Rhizopus oryzae lipase (Rol) on P. pastoris central carbon metabolism. Higher oxygen uptake and CO2 production rates and a slightly reduced biomass yield suggest an increased energy demand for the producing strain. This observation is further confirmed by 13C-based metabolic flux analysis. In particular, the flux through the methanol oxidation pathway and the TCA cycle was increased in the Rol-producing strain compared to the reference strain. In addition to changes in the flux distribution, significant variations in intracellular metabolite concentrations were observed. Most notably, the pools of trehalose, which is related to cellular stress response, and xylose, which is linked to methanol assimilation, were significantly increased in the recombinant strain.

    ecoHMEM: Improving object placement methodology for hybrid memory systems in HPC

    Recent byte-addressable persistent memory (PMEM) technology offers capacities comparable to storage devices and access times much closer to DRAM than other non-volatile memory technologies. To palliate the large gap with DRAM performance, DRAM and PMEM are usually combined. Users can either manage placement across the different memory spaces in software or use the DRAM as a cache for the virtual address space of the PMEM. We present a novel methodology for automatic object-level placement, including efficient runtime object matching and bandwidth-aware placement. Our experiments on Intel® Optane™ Persistent Memory show performance that ranges from matching to greatly improving on state-of-the-art software and hardware solutions, attaining over 2x runtime improvement in miniapplications and over 6% in OpenFOAM, a complex production application. This paper received funding from the Intel-BSC Exascale Laboratory SoW 5.1, the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 749516, the EPEEC project from the European Union's Horizon 2020 research and innovation program under grant agreement No. 801051, the DEEP-SEA project from the European Commission's EuroHPC program under grant agreement 955606, and the Ministerio de Ciencia e Innovación—Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). Peer Reviewed. Postprint (author's final draft).

    Prognostic Utility of a New Risk Stratification Protocol for Secondary Prevention in Patients Attending Cardiac Rehabilitation

    Several risk scores have been used to predict risk after an acute coronary syndrome (ACS), but none of these risk scores include functional class. The aim was to assess the predictive value of risk stratification (RS), including functional class, and how cardiac rehabilitation (CR) changed RS. Two hundred and thirty-eight patients with ACS from an ambispective observational registry were stratified as low (L) and no-low (NL) risk and classified according to exercise compliance: low risk and exercise (L-E), low risk and control (no exercise) (L-C), no-low risk and exercise (NL-E), and no-low risk and control (NL-C). The primary endpoint was cardiac rehospitalization. Multivariable analysis was performed to identify variables independently associated with the primary endpoint. The L group included 56.7% of patients. The rate of the primary endpoint was higher in the NL group (18.4% vs. 4.4%, p < 0.001). After adjustment for age, sex, diabetes, and exercise in multivariable analysis, HR (95% CI) was 3.83 (1.51-9.68) for cardiac rehospitalization. The prognosis varied with RS and exercise: the L-E group had a cardiac rehospitalization rate of 2.5% compared to 26.1% in the NL-C group (p < 0.001). Completing exercise training was associated with reclassification to low risk, which was in turn associated with a better outcome. This easy-to-calculate risk score offers robust prognostic information. No-exercise groups were independently associated with the worst outcomes. The exercise-based CR program changed RS, improving classification and prognosis.